@sebastiantia
Collaborator

This PR adds DefaultEngine::write_parquet_from_filtered_batches, which serves two purposes:

  1. Round-trip testing of the incoming single-file checkpoint write support (feat: add Snapshot::checkpoint() & Table::checkpoint() API #797). This requires a way to write the checkpoint data returned by the engine, which is an iterator of FilteredEngineData batches. For more information: (link to issue)
  2. Extension: a checkpoint API on the DefaultEngine that orchestrates the entire checkpointing process, from data creation to finalization.
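
To illustrate what writing "filtered" batches involves, here is a minimal sketch using only the standard library. It is not the kernel's code: `FilteredRows` and `selected_rows` are hypothetical stand-ins for FilteredEngineData and the filtering step, and the rule that rows past the end of the selection vector count as selected is an assumption made for this sketch.

```rust
/// Hypothetical stand-in for a batch of rows paired with a selection vector.
/// (Not the real FilteredEngineData type.)
struct FilteredRows {
    rows: Vec<String>,
    selection: Vec<bool>,
}

/// Lazily keep only the selected rows of each incoming batch.
/// Assumption for this sketch: rows beyond the end of `selection`
/// are treated as selected.
fn selected_rows(
    batches: impl Iterator<Item = FilteredRows>,
) -> impl Iterator<Item = Vec<String>> {
    batches.map(|batch| {
        let FilteredRows { rows, selection } = batch;
        rows.into_iter()
            .enumerate()
            // Look up each row's flag; a missing flag defaults to "keep".
            .filter(|(i, _)| selection.get(*i).copied().unwrap_or(true))
            .map(|(_, row)| row)
            .collect()
    })
}
```

Because the result is itself an iterator of batches, it could be handed directly to a streaming writer without materializing all of the checkpoint data at once.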

Other changes:

  • write_parquet_from_batches: Factors the writing logic out of write_parquet to support streaming data by accepting an iterator of batches instead of a single batch. Both write_parquet and the new write_parquet_from_filtered_batches use it.

  • write_parquet: Functionally unchanged; refactored to call write_parquet_from_batches under the hood.
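
The shape of this refactor can be sketched as follows. This is a simplified illustration, not the code in this PR: `RowBatch`, `write_from_batches`, and `write_single` are hypothetical names, and a plain `std::io::Write` sink with newline-separated rows stands in for the Parquet writer.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for one batch of already-encoded rows.
type RowBatch = Vec<String>;

/// Stream every batch into `sink`, returning the number of rows written.
/// Batches are consumed one at a time, so the caller never has to hold
/// the full data set in memory.
fn write_from_batches<W: Write>(
    sink: &mut W,
    batches: impl Iterator<Item = RowBatch>,
) -> io::Result<usize> {
    let mut rows_written = 0;
    for batch in batches {
        for row in &batch {
            writeln!(sink, "{row}")?;
        }
        rows_written += batch.len();
    }
    sink.flush()?;
    Ok(rows_written)
}

/// Single-batch entry point: same behavior as before the refactor,
/// now delegating to the streaming writer with a one-element iterator.
fn write_single<W: Write>(sink: &mut W, batch: RowBatch) -> io::Result<usize> {
    write_from_batches(sink, std::iter::once(batch))
}
```

Wrapping the single batch in `std::iter::once` keeps one code path for writing, so the single-batch and streaming entry points cannot drift apart.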

@codecov

codecov bot commented Apr 24, 2025

Codecov Report

Attention: Patch coverage is 82.35294% with 27 lines in your changes missing coverage. Please review.

Project coverage is 85.10%. Comparing base (635dca1) to head (b995734).
Report is 14 commits behind head on main.

Files with missing lines | Patch % | Lines
kernel/src/engine/default/parquet.rs | 81.63% | 5 Missing and 22 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #887      +/-   ##
==========================================
+ Coverage   84.98%   85.10%   +0.11%     
==========================================
  Files          84       85       +1     
  Lines       20849    21151     +302     
  Branches    20849    21151     +302     
==========================================
+ Hits        17718    18000     +282     
+ Misses       2241     2238       -3     
- Partials      890      913      +23     

@github-actions github-actions bot added the breaking-change Change that require a major version bump label Apr 24, 2025
@sebastiantia sebastiantia removed the breaking-change Change that require a major version bump label Apr 24, 2025
@github-actions github-actions bot added the breaking-change Change that require a major version bump label Apr 24, 2025
@sebastiantia sebastiantia marked this pull request as ready for review April 24, 2025 22:39
@sebastiantia sebastiantia requested a review from nicklan April 24, 2025 23:59
@sebastiantia sebastiantia removed the breaking-change Change that require a major version bump label May 7, 2025